Favorite Artist

code
data-analysis
Author

Jake Starkey

Published

March 6, 2024


The purpose of this blog post is to give readers insight into my favorite artists within the Spotify DataFrame.

Below is the spotify DataFrame, which reads the file spotify_all.csv containing Spotify users’ playlist information (Source: Spotify Million Playlist Dataset Challenge).

import pandas as pd
spotify = pd.read_csv('https://bcdanl.github.io/data/spotify_all.csv')
spotify
pid playlist_name pos artist_name track_name duration_ms album_name
0 0 Throwbacks 0 Missy Elliott Lose Control (feat. Ciara & Fat Man Scoop) 226863 The Cookbook
1 0 Throwbacks 1 Britney Spears Toxic 198800 In The Zone
2 0 Throwbacks 2 Beyoncé Crazy In Love 235933 Dangerously In Love (Alben für die Ewigkeit)
3 0 Throwbacks 3 Justin Timberlake Rock Your Body 267266 Justified
4 0 Throwbacks 4 Shaggy It Wasn't Me 227600 Hot Shot
... ... ... ... ... ... ... ...
198000 999998 ✝️ 6 Chris Tomlin Waterfall 209573 Love Ran Red
198001 999998 ✝️ 7 Chris Tomlin The Roar 220106 Love Ran Red
198002 999998 ✝️ 8 Crowder Lift Your Head Weary Sinner (Chains) 224666 Neon Steeple
198003 999998 ✝️ 9 Chris Tomlin We Fall Down 280960 How Great Is Our God: The Essential Collection
198004 999998 ✝️ 10 Caleb and Kelsey 10,000 Reasons / What a Beautiful Name 178189 10,000 Reasons / What a Beautiful Name

198005 rows × 7 columns

Variable Description

  • pid: playlist ID; unique ID for playlist
  • playlist_name: a name of playlist
  • pos: a position of the track within a playlist (starting from 0)
  • artist_name: name of the track’s primary artist
  • track_name: name of the track
  • duration_ms: duration of the track in milliseconds
  • album_name: name of the track’s album

Artist Occurrences

artist_count = spotify['artist_name'].value_counts()
artist_count
Drake                2715
Kanye West           1065
Kendrick Lamar       1035
Rihanna               915
The Weeknd            913
                     ... 
Luna City Express       1
Ninetoes                1
Rhemi                   1
Jamie 3:26              1
Caleb and Kelsey        1
Name: artist_name, Length: 18866, dtype: int64
  • The code above counts how many times each artist appears in the dataset (one count per track row). Drake appears most often, with 2,715 rows, followed by Kanye West, my favorite artist, with 1,065.
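Because value_counts() counts rows, several tracks by one artist in the same playlist each add to that artist's total. A minimal sketch of counting distinct playlists per artist instead, assuming the same column names; the small sample reuses pid and artist values from the outputs in this post rather than the full CSV.

```python
import pandas as pd

# Illustrative sample: pid/artist pairs taken from rows shown in this post (not the full dataset)
sample = pd.DataFrame({
    'pid': [0, 0, 10, 10, 10, 11, 11, 999990, 999990],
    'artist_name': ['Missy Elliott', 'Britney Spears',
                    'Drake', 'Drake', 'Drake',
                    'Drake', 'Drake',
                    'Drake', 'Drake'],
})

# Row-based count: every track row adds 1 to its artist
row_counts = sample['artist_name'].value_counts()

# Playlist-based count: number of distinct playlists (pid) that feature each artist
playlist_counts = (
    sample.groupby('artist_name')['pid']
    .nunique()
    .sort_values(ascending=False)
)

print(row_counts)
print(playlist_counts)
```

On the full spotify DataFrame, comparing the two counts would show whether an artist's total comes from many playlists or from a few playlists packed with their songs, such as playlist 999990.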

Favorite Artists DataFrame

favorite_artists = spotify[spotify['artist_name'].isin(['Kanye West', '21 Savage','Drake'])]
favorite_artists
pid playlist_name pos artist_name track_name duration_ms album_name
522 10 abby 8 Drake Portland 236614 More Life
544 10 abby 30 Drake Preach 236973 If You're Reading This It's Too Late
570 10 abby 56 Drake Headlines 235986 Take Care
638 11 VIBE 52 Drake Houstatlantavegas 290426 So Far Gone
639 11 VIBE 53 Drake Runnin Away For Good 292666 The Drake LP
... ... ... ... ... ... ... ...
197564 999990 Drake 0 Drake One Dance 173986 Views
197565 999990 Drake 1 Drake Fake Love 210937 More Life
197566 999990 Drake 2 Drake Pop Style 212946 Views
197567 999990 Drake 3 Drake Hotline Bling 267066 Views
197568 999990 Drake 4 Drake Legend 241853 If You're Reading This It's Too Late

4124 rows × 7 columns

  • The code above filters the DataFrame to keep only the tracks by my three favorite artists: Drake, Kanye West, and 21 Savage.
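Keeping the list of favorites in one variable makes it easy to change later. A minimal sketch, assuming a hypothetical `favorites` list and a small sample frame built from rows shown above; `DataFrame.query` with `@` returns the same rows as the `isin` mask.

```python
import pandas as pd

# Hypothetical variable holding the favorite artists used in this post
favorites = ['Kanye West', '21 Savage', 'Drake']

# Illustrative sample built from rows shown in the outputs above
sample = pd.DataFrame({
    'artist_name': ['Missy Elliott', 'Drake', 'Kanye West', 'Britney Spears', '21 Savage'],
    'track_name': ['Lose Control (feat. Ciara & Fat Man Scoop)', 'Portland',
                   'Runaway', 'Toxic', '7 Min Freestyle'],
})

# Boolean-mask version, as in the post
by_isin = sample[sample['artist_name'].isin(favorites)]

# Equivalent query version; '@favorites' refers to the local Python variable
by_query = sample.query('artist_name in @favorites')

print(by_query)
```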

Favorite Artists’ Occurrences

favorite_count = favorite_artists['artist_name'].value_counts()
favorite_count
Drake         2715
Kanye West    1065
21 Savage      344
Name: artist_name, dtype: int64
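  • This code counts how many times each of my three favorite artists appears in the dataset. The counts for Drake and Kanye West match the full-dataset counts above, since the filter keeps all of their rows.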
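Since favorite_artists contains only these three artists, their counts should add up to its 4124 rows. A short sketch that checks this and expresses each count as a percentage of all 198,005 rows, using only the numbers reported in the outputs above.

```python
import pandas as pd

# Counts and total row count as reported in the outputs above
favorite_count = pd.Series({'Drake': 2715, 'Kanye West': 1065, '21 Savage': 344})
total_rows = 198005

# The three counts should sum to the 4124 rows of favorite_artists
print(favorite_count.sum())

# Each artist's share of all playlist rows, in percent
share_pct = (favorite_count / total_rows * 100).round(2)
print(share_pct)
```

Even Drake, the most frequent artist in the whole dataset, accounts for only about 1.37% of all rows.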

Favorite Artists’ Track Durations

sorted_fav_artists = favorite_artists.sort_values(by='duration_ms', ascending=False)
no_duplicates = sorted_fav_artists.drop_duplicates(subset=['artist_name', 'track_name'])
longest_tracks = no_duplicates[['artist_name', 'track_name', 'duration_ms']].head(10)
longest_tracks
artist_name track_name duration_ms
67145 Kanye West Last Call 760973
119752 Kanye West Runaway 547733
123616 Kanye West Blame Game 469866
123594 Drake Cameras / Good Ones Go Interlude - Medley 434960
47541 Drake Pound Cake / Paris Morton Music 2 433800
138846 21 Savage 7 Min Freestyle 431586
65122 Drake Shut It Down 419306
162482 Kanye West So Appalled 397666
131325 Drake Uptown 381240
55630 Kanye West Monster 378893
  • The code above sorts the DataFrame of my favorite artists’ tracks from longest to shortest by duration_ms.
  • It then drops duplicate artist/track pairs so each song appears only once; because the rows are already sorted, the longest instance of each song is kept.
  • Finally, it selects the 10 longest tracks across my three favorite artists.
  • Kanye West has the three longest tracks among my favorite artists, led by “Last Call” at 760,973 ms (about 12.7 minutes).
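The sort, de-duplicate, and head steps can also be written as one chain. A minimal sketch, assuming a small sample built from the output above plus one repeated "Runaway" row standing in for the same song appearing in another playlist. Grouping by artist and track and taking the maximum keeps each song's longest recorded duration (the same row that survives a descending sort followed by drop_duplicates), and `nlargest` replaces the sort plus `head`. The sketch also converts milliseconds to minutes.

```python
import pandas as pd

# Illustrative sample: tracks from the output above; the second 'Runaway' row is a
# repeat standing in for the same song appearing in another playlist
tracks = pd.DataFrame({
    'artist_name': ['Kanye West', 'Kanye West', 'Kanye West', 'Drake',
                    '21 Savage', 'Kanye West'],
    'track_name': ['Last Call', 'Runaway', 'Blame Game', 'Shut It Down',
                   '7 Min Freestyle', 'Runaway'],
    'duration_ms': [760973, 547733, 469866, 419306, 431586, 547733],
})

# One row per (artist, track), keeping its longest duration, then the top 3 overall
longest = (
    tracks.groupby(['artist_name', 'track_name'], as_index=False)['duration_ms']
    .max()
    .nlargest(3, 'duration_ms')
    .copy()
)

# Convert milliseconds to minutes for readability (1 minute = 60,000 ms)
longest['duration_min'] = (longest['duration_ms'] / 60_000).round(2)
print(longest)
```

On the full data, passing no_duplicates or favorite_artists in place of `tracks` and using 10 instead of 3 would give a ranking comparable to longest_tracks.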

Favorite Artists’ Average Position

fav_artist_name = favorite_artists['artist_name'].unique()
artist_tracks = spotify[spotify['artist_name'].isin(fav_artist_name)]
avg_pos = artist_tracks.groupby('artist_name')['pos'].mean()
avg_pos
artist_name
21 Savage     57.776163
Drake         57.143278
Kanye West    51.292958
Name: pos, dtype: float64
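  • The code above computes the average position (pos) of each favorite artist’s tracks within playlists. Because pos starts at 0, a lower average means the artist tends to appear earlier in a playlist: Kanye West’s tracks sit earliest on average (about 51.3), while Drake and 21 Savage are both near 57.
  • Since favorite_artists already contains only these artists’ tracks, favorite_artists.groupby('artist_name')['pos'].mean() gives the same result without filtering spotify a second time.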
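A mean can be pulled up by a few tracks placed deep in long playlists, so it may help to report the median and the number of tracks alongside it. A minimal sketch using `agg`, assuming a sample made of the ten Drake rows shown in the favorite_artists output; on this small sample the median is far below the mean.

```python
import pandas as pd

# Illustrative sample: the ten Drake positions shown in the favorite_artists output
drake_rows = pd.DataFrame({
    'artist_name': ['Drake'] * 10,
    'pos': [8, 30, 56, 52, 53, 0, 1, 2, 3, 4],
})

# Mean, median, and number of tracks per artist in one call
pos_summary = drake_rows.groupby('artist_name')['pos'].agg(['mean', 'median', 'count'])
print(pos_summary)
```

Applied to favorite_artists, the same call would show whether the roughly 6-position gap between Kanye West and the other two artists also holds for the median.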